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ABSTRACT 

Objective Lumbar disc degeneration (LDD) is an 
important cause of low back pain, which is a common 
and costly problem. LDD is characterised by disc space 
narrowing and osteophyte growth at the circumference of 
the disc. To date, the agnostic search of the genome by 
genome-wide association (GWA) to identify common 
variants associated with LDD has not been fruitful. This 
study is the first GWA meta-analysis of LDD. 
Methods We have developed a continuous trait based 
on disc space narrowing and osteophytes growth which 
is measurable on all forms of imaging (plain radiograph, 
CT scan and MRI) and performed a meta-analysis of five 
cohorts of Northern European extraction each having 
GWA data imputed to HapMap V.2. 
Results This study of 4600 individuals identified four 
single nucleotide polymorphisms with p<5x10~ 8 , the 
threshold set for genome-wide significance. We identified 
a variant in the PARK2 gene (p=2.8x10~ 8 ) associated 
with LDD. Differential methylation at one CpG island of 
the PARK2 promoter was observed in a small subset 
of subjects (P=8.74x10 -4 , p=0.006). 
Conclusions LDD accounts for a considerable 
proportion of low back pain and the pathogenesis of 
LDD is poorly understood. This work provides evidence 
of association of the PARK2 gene and suggests that 
methylation of the PARK2 promoter may influence 
degeneration of the intervertebral disc. This gene has 
not previously been considered a candidate in LDD and 
further functional work is needed on this hitherto 
unsuspected pathway. 



INTRODUCTION 

Lumbar disc degeneration (LDD) is a common, 
age-related trait: 1 over a third of middle aged 
women have at least one degenerate disc. LDD 
contributes to low back pain 2 3 and as low back 
pain is common in the general population and 
costly to society, 4 LDD is of considerable public 
health importance. Discrete biochemical, histo- 
logical, metabolic and functional changes occur in 
LDD, such that discs become dehydrated, lose disc 
height and there is accompanying outgrowth of 
osteophytes from the vertebral body margin. 5 
There are similarities with peripheral joint 



osteoarthritis (OA). LDD has been shown to be 
heritable, with estimates of 65%-80% 6 7 and so a 
considerable proportion of the variance in LDD is 
explained by genetic factors. Yet to date, candidate 
gene studies have detected only a small number of 
convincing associations of genetic variants with 
LDD (reviewed by Ryder et al 8 ). A number of 
studies show conflicting results: these are likely 
due to small sample size or may reflect ethnic dif- 
ferences between Northern European and Asian 
populations, as seen in OA. 9 That some published 
genome-wide associations (GW\s) in common 
complex traits fail to replicate candidate gene find- 
ings suggests limitations to the candidate gene 
method. 10 As in other common complex traits, 11 a 
considerable proportion of the genetic variance in 
LDD remains unexplained. While GWA studies do 
not capture all variation in the genome, the 
approach does offer an agnostic search of the 
genome for variants associated with common 
complex traits. Their main limitation results from 
the inherent multiple testing in their design, 
meaning that power is lost and large samples are 
needed to address this. In order to optimise sample 
size in the present study we performed 
meta-analysis of GWA.S using a number of cohorts 
having the LDD phenotype. 

PATIENTS AND METHODS 

The cohorts available for inclusion in this study 
were all population samples, except Genetics of 
osteoARthrosis and Progression study (GARP) 
which specifically recruited participants having a 
diagnosis of OA. A variable was derived from mea- 
sures of disc height and osteophytes obtained from 
lateral images on MR, CT scan or plain radiograph. 
Summing this variable over the lumbar discs pro- 
vided a continuous measure of disc degeneration. 
GWA of this summary variable was performed by 
each individual study group and summary statis- 
tics were sent to and collated by KCL. 
Meta-analysis was performed of imputed GWA. 
data from five population cohorts (Framingham, 
GARP, Rotterdam study 1 and 3 and TwinsUK 
(TUK)) having imaging of the spine (see below). 
All cohorts had obtained fully informed consent 
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from their participants and appropriate ethics committee 
approval. In all studies, a cumulative degeneration score was 
constructed from the sum of scores of degenerative change at 
each level (disc space narrowing coded 0-3 and osteophytes, 
either anterior or posterior or both, coded 0-3). In those 
cohorts where only four disc levels were read (Framingham 
Heart study (FHS)), a fifth level was imputed by taking the 
mean reading for four discs as a surrogate for the fifth disc, and 
summing over five discs. The data underwent inverse normal 
transformation to generate a normally distributed variable. 

Phenotyping the cohorts 

The FHS is a longitudinal cohort of a defined population 
in Massachusetts, initiated in 1948 (http://www. 
framinghamheartstudy.org). It began as a study sample of 5209 
Framingham men and women between the ages of 30 and 60. 
Subsequently, offspring and third generation subjects were 
incorporated. Every other year, after an extensive baseline 
examination, subjects undergo testing that includes a medical 
history, blood profile, echocardiogram, and bone, eye and other 
tests. The subset of the Framingham subjects covered by the 
current analysis comprised 366 subjects from the Offspring and 
Generation three arms of the study who had undergone CT 
scanning of the spine, and the recruitment, conduct and specifi- 
cations of CT scanning have been reported elsewhere. 12 
Measurement of the lumbar spine CTs for disc height and 
scoring (0-3) for anterior and posterior osteophytes was per- 
formed by a spine specialist using the mid-sagittal plane at 
spinal levels L2-L3, L3-L4, L4-L5 and L5-S1 by author PS using 
the atlas of Jarosz et at 7 Using sagittal CT reformatting, the 
mid-sagittal plane was determined at each spinal level and mea- 
surements of disc height in millimetres were made. The mea- 
sured values for disc height (mm) were converted to 0-3 
categorical scale for disc height loss. Using an imputed value 
for the fifth lumbar disc based on the mean value of the mea- 
sured four discs, values for disc height loss and anterior and 
posterior osteophytes were summed over five lumbar disc 
levels. 

The GARP study comprises white sibling pairs of Dutch 
origin affected by OA at multiple sites and is aimed at identify- 
ing determinants of OA susceptibility and progression. 
Probands (ages 40-70 years) and their siblings had OA at mul- 
tiple joint sites of the hand or in two or more of the following 
joint sites: hand, spine (cervical or lumbar), knee or hip as 
described previously. 13 Subjects included in this study had 
undergone lateral radiographs of the spine (T4-S1). Each inter- 
vertebral disc level from Ll/2 to L5/S1 was reviewed for the 
presence and severity of osteophytes (anterior) and disc narrow- 
ing, using the Lane atlas 14 where 0=none; grade l=mild; grade 
2=moderate; and grade 3=severe. The score at each level for 
anterior osteophytes and disc height loss were summed over 
the five lumbar levels. 

The Rotterdam study is a prospective population-based 
follow-up study of the determinants and prognosis of chronic 
diseases in the elderly. 15 16 All persons living in Ommoord, a 
suburb of Rotterdam, who were aged 55 years and over were 
invited to participate. A total of 7983 participants were exam- 
ined. For the current analysis, two subsets of the data were 
considered. Rotterdam cohort 1 (RSI) consists of 2440 subjects; 
Rotterdam cohort 3 (RS3) consists of 974 subjects. Subjects ori- 
ginating from the Rotterdam study underwent plain radiog- 
raphy and scoring of LDD as previously described. 2 In brief, 
lateral lumbar radiographs were scored by a single observer for 
the presence of the individual radiographic features of disc 



degeneration. Each intervertebral disc from Ll/2 to L5/S1 was 
reviewed for the presence and severity of osteophytes (anterior) 
and disc narrowing using the Lane atlas as described above. 14 
The scores for the two traits over the five lumbar discs were 
summed. 

The TUK registry was described previously. 17 The register was 
started in 1993 and now comprises of approximately 10 000 
monozygotic and dizygotic adult Caucasian twins aged 
16-85 years from all over the UK, plus some parents and sib- 
lings. It now incorporates previous twin registries from the 
Institute of Psychiatry and Aberdeen University. This is a vol- 
unteer sample recruited by successive media campaigns 
without selecting for particular diseases or traits. All twins 
receive a series of detailed disease and environment question- 
naires. The majority of twins have been assessed in detail clin- 
ically at several time points for several hundred phenotypes 
related to common diseases or intermediate traits. The subset 
of TUK covered by the current analysis consisted of 744 sub- 
jects who had participated in the spine MR study (scanned 
1996-2000) using a Siemens MR machine with (Munich, 
Germany) 1.0-tesla superconducting magnet. Serial sagittal 
images of the cervical, thoraco-lumbar junction and lumbar 
spine (T9-L5) were obtained. 7 Images were coded for disc 
height loss and anterior osteophytes using a 0-3 scale in each 
case, where 0 is normal and 3 maximal degeneration as per the 
atlas of Jarosz et al? All five lumbar discs were scored and the 
scores summed to give a combined LDD variable. 

Genotyping and imputation 

FHS subjects were genotyped using Affymetrix GeneCHip 
Human Mapping 500 K array set (Affymetrix, Santa Clara, CA, 
USA) and/or the 100 K array set and/or the 50 K array. 
Methods and quality controls have been described previously. 18 

GARP subjects were genotyped using Illumina Human660W 
Quad BeadChips (HumanHap550v3, HumanHap610; Illumina, 
San Diego, CA, USA). Genotyping was performed at the geno- 
typing Rotterdam Genotyping Centre. Positive strand geno- 
types were called by clustering in Genome studio and 
imputation was performed using IMPUTE software and 
hapmap phase II V21. 19 20 Strict selection criteria were applied 
to the measured genotypes using a high information content 
(r 2 of >95%) and a minor allele frequency >0.0025. Association 
analyses were performed using an inhouse developed software 
package that allows the analyses of family data using all infor- 
mation available in the cases and controls by extending the 
Cochran-Armitage trend test. 21 

RSI and RS3 subjects in the Rotterdam Study sets were geno- 
typed on the HumanHap550v3 (RSI) or HumanHap610 (RS3) 
Genotyping BeadCHip (Illumina, San Diego, California, USA). 
The following sample quality control criteria were applied: 
sample call rate >97.5%, gender mismatch with typed X-linked 
markers, evidence for DNA contamination in the samples using 
the mean of the autosomal heterozygosity >0.33, exclusion of 
duplicates or first -degree relatives identified using Identity by 
State probabilities and exclusion of outliers (four SD away 
from the population mean using multidimensional scaling ana- 
lysis with four principal components). Filtering criteria for 
imputation are summarised in supplementary table SI. 

TUK subjects were genotyped using a combination of 
Illumina arrays (Human Hap300 and the Human Hap610Q). 
Genotyping was performed by the Wellcome Trust Sanger 
Institute using the Infinium assay (Illumina) across three 
genome-wide single nucleotide polymorphism (SNP) sets, as 
described previously. 22 Genotyping results had been sent to 
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KCL for collation and analysis using the statistical package, 
STATA (StataCorp) 23 Strict quality control was applied: 
314 075 SNPs were retained for analysis (98.7%); 733 were 
excluded because their call rates were <90% and 725 SNPs had 
minor allele frequency <0.01. In TUK, significant population 
substructure was excluded using the STRUCTURE program. 24 

GWA analysis 

All analyses were performed on inverse normal transformed 
summary LDD score as described above. Each study performed 
GWA analysis for LDD scores using either MACH2QTL (http:// 
www.sph.umich.edu/csg/abecasis/AAACH/index.html) (RSI 
and RS3) or SNPTEST (http://www.stats.ox.ac.uk/~marchini/ 
software/gwas/snptest.html) (GARP), which use genotype 
dosage value as continuous additive predictors of LDD score in 
a linear regression framework, or ProbABEL 25 using an additive 
genetic model while accounting for relatedness between the 
members of a family. Analysis of imputed genotype data 
accounted for uncertainty in each genotype prediction by using 
either the dosage information from MACH 26 or the genotype 
probabilities from IMPUTE. 19 

Meta-analysis of the five study groups 

Genotypes for 2.5-3 million autosomal SNPs were imputed sep- 
arately to increase coverage using HapMap V2 (http://www. 
hapmap.org) as the reference panel. In GARP and TUK, imput- 
ation was performed with IMPUTE V2 19 and in the other 
studies with MACH. 26 The common reference panel led to the 
reporting of results for the positive strand for all cohorts. In add- 
ition, allele pairs were compared between cohorts and no detect- 
able strand-flips were found; the minor allele frequency was also 
compared between datasets. The distributions of (3 values of the 
cohorts were found to be similar and therefore suitable for 
meta-analysis. All directly geno typed or imputed autosomal 
SNPs having information from more than one study group 
(n=2 552 511) were included in the meta-analysis. Association 
results were combined using inverse variance weighted fixed 
effects meta-analysis using PLINK VI. 06 (http://pngu.mgh. 
harvard.edu/purcell/plink/). Two meta-analyses were run: the 
first was unadjusted; the second was adjusted for age and sex as 
both are known risk factors for LDD and each risk factor was 
correlated with LDD in each study group. Heterogeneity of esti- 
mated effect was expressed using Q (weighted sum of squares) 
and I 2 (ratio of true heterogeneity to total observed variation). 
SNPs were excluded from the meta-analysis if the cohort-specific 
imputation quality, as assessed by r 2 (MACH) or Information 
Score (IMPUTE) metric, was <0.40. On this basis, one marker 
was excluded from the unadjusted association and one from the 
adjusted association. 



DNA methylation data and analysis 

Whole blood DNA methylation levels were obtained for 38 
individuals in the TUK cohort using the Illumina 
HumanMethylation27 DNA Analysis BeadChip assay as previ- 
ously described. 27 The sample included four monozygotic twin 
pairs, eight dizygotic twin pairs and 14 unrelated individuals. 
At each CpG site within an individual the methylation level 
was presented as p, which represents the ratio of intensity 
signal obtained from the methylated beads in the array over 
the sum of methylated and unmethylated bead signals. 
Following quality control checks, we obtained DNA methyla- 
tion at three CpG sites in the promoter region of the PARKZ 
gene within 2kb of the transcription start site. The three 
probes (cgl5832436, cg21926612 and cg24816866) mapped 
uniquely to the human genome (hgl8) within two mismatches 
(see Bell et al 2B ). We fitted linear mixed effects models to assess 
association between DNA methylation levels at the three CpG 
sites in the PARK2 promoter and LDD. We regressed the raw 
methylation levels on fixed-effect terms including methylation 
chip and LDD, and random-effect terms denoting family struc- 
ture and zygosity, and compared the association of differen- 
tially methylated regions with a null model, which excluded 
LDD from the fixed-effects terms. We also repeated the associ- 
ation analyses by normalising the methylation values at each 
CpG site to N(0, 1). 

RESULTS 

The study samples for the meta-analysis included 4683 indivi- 
duals of European ancestries. Table 1 shows sample size, demo- 
graphic characteristics, LDD and lumbar spine imaging method 
for each independent cohort. The majority of participants were 
female subjects (67.0%) and the samples had a mean age of 
57.7 years. Across the cohorts, the mean level of LDD varied 
from 0.011 to 3.46, reflecting differences in imaging methods. 
However, the variance of the LDD variables was broadly 
similar (range 0.958-1.14), as were the distributions of the esti- 
mated genetic effect sizes (p). The genotyping and imputation 
methods are shown in online supplementary table SI. 

Quantile-quantile plots for the unadjusted LDD GWA meta- 
analysis are presented in figure 1 (see online supplementary 
figure SI, adjusted). Test statistic inflation post meta-analysis, 
as measured by the genomic control statistic, 29 was low 
(Agc unadjusted=1.02; A C c adjusted=1.03). Results of the 
unadjusted and adjusted association analyses were broadly 
similar, with the p values of the adjusted analysis somewhat 
attenuated. A Manhattan plot for the unadjusted analysis is 
shown in figure 2 with numeric results in table 2 (unadjusted) 
and online supplementary table S2 (adjusted) for SNPs having 
p-clO" 5 . 



Table 1 Characteristics of the study samples 





FHS 


GARP 


RSI 


RS3 


TUK 


N 


330 


192 


2440 


974 


744 


Age (years) 


54.3 (11.0) 


60.3 (7.1) 


65.7 (6.7) 


54.7 (3.4) 


53.6 (8.3) 


Women (%) 


42.2 


79.7 


57 


59 


96.8 


BMI (kg/m 2 ) 


28.1 (5.1) 


26.44 (4.8) 


26.3 (3.4) 


27.12 (4.6) 


24.9 (4.4) 


Lumbar spine imaging 


CT 


Radiograph 


Radiograph 


Radiograph 


MRI 


LDD variable 


2.49 (0.97) 


0.02 (0.958) 


0.006 (0.978) 


0.011 (0.965) 


3.46 (1.14) 



Values are mean (SD) unless specified otherwise. 

BMI, body mass index; FHS, Framingham Heart Study; GARR Genetics of OsteoArthrosis and Progression study; LDD, Lumbar disc degeneration; RS1, Rotterdam study cohort 1; 
RS3, Rotterdam study cohort 3; TUK, TwinsUK. 
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Figure 1 Quantile— quantile plot of observed versus expected 
distribution of p values for the genome-wide association (GWA) 
meta-analysis. The plots show GWA meta-analysis quantile-quantile 
plot of observed against expected results, unadjusted for covariates. 



Four markers achieved genome-wide significance in the 
unadjusted GWAS, of which three were on chromosome 6 
(rs926849; rs2187689; rs7767277) and an intergenic marker on 
chromosome 3 (rs 17034687). The results of the meta-analysis 
adjusted for age and gender were broadly similar: the strongest 
signal was for SNP rs926849. This SNP lies on an intronic 
region of the Parkinson protein 2, E3 ubiquitin protein ligase 
(PARK2) gene on chromosome 6. A Forest plot of the groups' 
results and their meta-analysis is shown in figure 3. Data were 
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Figure 2 Manhattan plot for meta-analysis of unadjusted 
genome-wide association results. Plot shows combined results for the 
five studies included in the meta-analysis, unadjusted results. The blue 
and red horizontal lines mark the levels of suggestive and likely 
significance, respectively. 



available from four study groups and the range of estimated 
minor allele frequencies was 0.23-0.32. Imputation quality was 
high for all four studies contributing this SNP (>0.90, table 2) 
and the estimated between-study heterogeneity was acceptable 
(I 2 =0%; p(Q)=0.67). The results of a four-study meta-analysis 
which excluded GARP show the marker to have p=9.5e~ 8 
(unadjusted). The minor or C allele of rs926849 was associated 
with a lower level of LDD implying that the minor allele is pro- 
tective. Figure 4 shows a regional plot of both genotyped and 
imputed SNPs within 200 Kb of the PARK2 gene, along with 
recombination rates. 

Two of the other strongly associated SNPs are in perfect 
linkage disequilibrium (LD): rs2187689 and rs7767277 on 
chromosome 6. Data were available for four studies and the 
range of estimated allele frequency was 0.06-0.09. Imputation 
quality was high for all four studies (>0.90). Both SNPs are in 
strong LD (^=0.76) with an intronic marker on the prote- 
asome subunit, p type 9, large multifunctional peptidase 2 gene 
(PSMB9) that is located in the class II region of the major 
histocompatibility complex (MHC). Both genotyped and 
imputed SNPs within 400 Kb of rs2 187689, along with recom- 
bination rates, are shown in a regional plot in online 
supplementary figure S3. None of these top SNPs is in LD with 
known functional SNPs in either PARK2 or PSMB9. 

We tested for an association between LDD and DNA methy- 
lation variants at three CpG sites in the PARK2 promoter. 
A significant association between DNA methylation at CpG 
site cgl5832436 and LDD (p=8.74xl0" 4 , SE=2.49xlO" 4 , 
p=0.006) was observed. The pattern of hypermethylation with 
increasing LDD levels was reflected at the remaining promoter 
CpG sites; however, these did not reach nominal significance 
(cg21926612 p=0.003, p=0.14; cg24816866 p=6.76xl0" 4 , 
p=0.39). We repeated the analyses using normalised methyla- 
tion levels and observed that the association between 
cgl5832436 and LDD remained nominally significant. 

DISCUSSION 

GWA offers an unbiased scan of common genetic variants 
(minor allele frequency >5%) and thus may deliver novel var- 
iants in genes not hitherto suspected of playing a role in disc 
degeneration. This work is among the first to report on a 
genome-wide meta-analysis being conducted for LDD. LDD is 
an age-related process which occurs in all people to some 
extent and may be detected as early as the teenage years. 1 LDD 
is known to have genetic determinants 7 1 and its expression is 
also influenced by gender (women develop LDD later), body 
mass index 30-34 and smoking. 35 Occupational factors also play 
a small role in LDD. 36 37 LDD as determined by MRI has been 
implicated in the development of episodes of severe and disab- 
ling low back pain 3 We undertook this large meta-analysis in 
order to identify novel genetic variants associated with LDD 
and to shed light on the underlying pathology of disc 
degeneration. 

GWA data obtained using differing chip technology may be 
readily compared using imputation with HapMap. In total, 
2 552 511 overlapping markers were available in each cohort. 
We identified four markers having significant association with 
the LDD phenotype (p<5xl0~ 8 ). There was similarity in the 
results obtained with and without adjustment for the covari- 
ates age and sex. A total of 26 markers had p<10 -5 in both 
meta-analyses. As expected, results of the adjusted analyses 
had slightly attenuated p values (see online supplementary 
table S2) which likely reflect the confounding effect of age. In 
both analyses, there were multiple associations to the Human 
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Table 2 Results of the genome-wide association meta-analysis (unadjusted, showing those SNPs having p<10 5 ). 



MAF Imputation quality* 



SNP 


Chr 


Gene 


l\l 


RS1 


RS3 


TUK 


FHS 


GARP 


RS1 


RS3 


TUK 


FHS 


GARP 


Eff all 


P 


SE 


p Value 


rs1 7034687 


3 


NA 


4429 


0.09 


0.1 


0.05 


0.08 


NA 


0.92 


0.93 


0.88 


0.82 


NA 


C 


0.23 


0.038 


1.82E-09 


rs21 87689 


6 


NA 


4457 


0.08 


0.07 


0.06 


0.09 


NA 


0.98 


0.99 


0.95 


0.93 


NA 


C 


0.23 


0.041 


2.72E-08 


rs7767277 


6 


NA 


4457 


0.08 


0.07 


0.06 


0.09 


NA 


0.98 


0.99 


0.95 


0.93 


NA 


A 


0.23 


0.041 


2.81E-08 


rs926849 


6 


PARK2 


3939 


0.31 


0.32 


NA 


0.31 


0.23 


0.98 


0.99 


NA 


0.9 


95.04 


C 


-0.13 


0.024 


3.25E-08 


rs7744666 


6 


NA 


4466 


0.1 


0.09 


0.06 


0.1 


NA 


0.99 


1 


0.97 


0.96 


NA 


C 


0.2 


0.037 


5.58E-08 


rs1 1969002 


6 


NA 


4466 


0.1 


0.09 


0.06 


0.1 


NA 


0.99 


1 


0.97 


0.96 


NA 


A 


0.2 


0.037 


5.59E-08 


rs6457690 


6 


NA 


4464 


0.1 


0.09 


0.07 


0.11 


NA 


0.98 


1 


0.96 


0.97 


NA 


A 


0.19 


0.036 


9.36E-08 


rs1 029296 


6 


NA 


4464 


0.1 


0.09 


0.07 


0.11 


NA 


0.98 


1 


0.96 


0.97 


NA 


C 


0.19 


0.036 


9.39E-08 


rs6936004 


6 


NA 


4462 


0.1 


0.09 


0.07 


0.11 


NA 


0.98 


1 


0.96 


0.97 


NA 


C 


0.19 


0.036 


1.04E-07 


rs3749982 


6 


NA 


4458 


0.1 


0.09 


0.06 


0.1 


NA 


0.99 


1 


0.96 


0.96 


NA 


A 


0.19 


0.037 


1.46E-07 


rs9469300 


6 


NA 


4482 


0.1 


0.09 


0.07 


0.1 


NA 


0.99 


1 


0.92 


0.96 


NA 


A 


0.19 


0.037 


1.47E-07 


rs1 021 4886 


6 


NA 


4479 


0.1 


0.09 


0.07 


0.11 


NA 


0.98 


1 


0.92 


0.97 


NA 


A 


0.19 


0.036 


2.32E-07 


rs1 0046257 


6 


NA 


4461 


0.1 


0.09 


0.08 


0.11 


NA 


0.98 


1 


0.96 


0.97 


NA 


A 


0.19 


0.037 


3.22E-07 


rs4875102 


8 


NA 


4608 


0.26 


0.26 


0.27 


0.25 


0.29 


0.99 


0.99 


0.95 


0.91 


97.65 


A 


-0.12 


0.024 


3.61E-07 


rs3019449 


6 


PARK2 


4636 


0.32 


0.32 


0.31 


0.31 


0.28 


0.98 


0.98 


0.98 


0.99 


97.18 


A 


-0.12 


0.023 


3.68E-07 


rs1 029295 


6 
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*r 2 from MACH for RS1, RS3, FHS and GARP; information score from IMPUTE for TUK. 
Studies contributing data are denoted. 

SNR single nucleiotide polymorphism; Chr, chromosome; N, number of subjects studied; MAF minor allele frequency; RS1, Rotterdam study cohort 1; RS3, Rotterdam study 
cohort 3; TUK, TwinsUK; FHS, Framingham Heart Study; GARP Genetics of OsteoArthrosis and Progression study; Eff all, effect allele; p, effect size; SE, SE of p. 



Leukocyte Antigen (HLA) region and to markers in PARK2 
(Parkinson protein 2, E3 ubiquitin protein ligase). Among the 
most significant findings (table 2) is SNP rs926849 that lies at 
6q25.2-27 within an intron in the PARK2 gene, a large gene of 
1.3 Mb comprising 12 exons. The SNP encodes a change of base 
from T to C and is reported to have a minor allele frequency of 
0.23-0.34 in dbSNP, which is keeping with the findings in our 
study groups (table 2, figures 3 and 4). Although this SNP has 
not been directly genotyped by any study group, estimates 



suggest imputation to be accurate for rs926849 (range 95%- 
99%, table 2). PARK2 encodes a protein called parkin, which is 
a component of a multiprotein E3 ubiquitin ligase complex 
that mediates the targeting of unwanted proteins for proteaso- 
mal degradation. This complex also controls the level of pro- 
teins involved in cell activities such as cell division and growth: 
it may be a tumour suppressor protein. Parkin is also involved 
in mitophagy: it translocates from the cytosol and promotes 
the degradation of uncoupled mitochondria. 38 Parkin is widely 
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Figure 3 Forest plot of rs926849 in PARK2 unadjusted for covariates. The contribution of the studies included in the meta-analysis is shown in this 
fixed effects model. The C allele is considered. Heterogeneity, l 2 =0%; p(Q)=0.67. TE, treatment effect; seTE, standard error treatment effect. 



expressed in solid organs as well as skeletal muscle (http://www. 
proteinatlas.org/). Mutations within PARK2 are associated with 
diverse conditions including autosomal recessive juvenile 
Parkinson's disease, Alzheimer's disease, diabetes mellitus and 
several solid tumours (reviewed in 39 ) . Parkin may account for the 
inverse relation between Parkinson's disease and cancer inci- 
dence. 40 Our findings of hypermethylation with increasing LDD 
score suggest that PARK2 expression is reduced with increasing 
disc degeneration but functional studies of intervertebral disc and 
other spine tissues are needed. 

Three further markers in the unadjusted meta-analysis had 
p<5xl0~ 8 . Marker rsl7034687 is an intergenic marker on 
chromosome 3. Based on One Thousand Genomes (1KG)/CEU 
data, it is not in LD (r 2 > 0.3) with any known gene-based 
markers. Markers rs2 187689 (supplementary figure S2) and 
rs7767277 are HLA region markers, neither of which is included 
in the 1KG pilot data. Using data from HapMap V3 (release 2), 
rs2187689 and rs776277 are in perfect LD with each other and 
in LD (r 2 = 0.76) with an intronic marker in PSMB9 (prote- 
asome (prosome, macropain) subunit, p type 9; large 



multifunctional peptidase 2). Proteasomes are distributed 
throughout eukaroytic cells at high concentration and cleave 
peptides in an ATP/ubiquitin-dependent process in a non- 
lysosomal pathway. The gene is located in the class II region of 
the MHC. Expression of the gene is induced by interferon y 
and this gene product replaces catalytic subunit 1 (proteasome 
P 6 subunit) in the immunoproteasome. 

While lumbar degeneration is not considered an inflamma- 
tory process and has not been reported to be auto-immune in 
aetiology, there is evidence of pro-inflammatory cytokine acti- 
vation in herniated lumbar discs 41 and anti-TNF has been used 
successfully to treat disc herniation-induced sciatica. 42 Of note, 
the COL11A2 gene lies 169 KB upstream from rs2187689. An 
SNP (rs2076311) within this candidate gene has been shown to 
be associated with MR determined disc signal intensity in a 
candidate gene study of Finnish male twins. 43 SNP rs2076311 
is not, however, in LD with our top hit, rs2187689 (r 2 =0.017) 
and so it seems unlikely that this collagen-encoding gene 
accounts for our observed association. Many published GWA 
studies have identified SNPs in intergenic regions and the 
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Figure 4 Regional plot of association results and recombination rates for the PARK2 gene, unadjusted results, -login, p values (y-axis) of the single 
nucleotide polymorphisms (SNPs) are shown according to their chromosomal positions (x-axis) with lead SNP shown as a purple diamond. The 
colour intensity of each symbol depicting an SNP reflects the extent of LD with the rs926849, coloured red (r 2 >0.8) through to blue (r 2 <0.2). 
Genetic recombination rates (cM/Mb), estimated using HapMap CEU samples, are shown with a light blue line. Physical positions are based on build 
36 (NCBI) of the human genome. Also shown are the relative positions of genes mapping to the region of association. Genes have been redrawn to 
show the relative positions and, therefore, the maps are not to physical scale. 
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precise role of these regions is yet to be defined. Long range 
enhancers, for example, could operate here and so an influence 
on COL11A2 expression cannot be ruled out. 

Of suggestive significance is SNP rs4802666 (p=3.76xl0~ 6 , 
adjusted meta-analysis) which lies within the MYH14 gene 
which encodes myosin, heavy chain 14, non-muscle. It is 
expressed in cell lines derived from bone (http://www. 
proteinatlas.org) and is implicated in autosomal dominant 
hearing impairment. It is of interest in LDD because it lies on 
chromosome 19 under the linkage peak we have reported in 
twins for LDD 44 and a peak reported by the Framingham 
group for hand OA. 45 As there is a known relationship between 
these two phenotypes, this region on chromosome 19 forms a 
highly plausible candidate region for OA. It is not impossible 
that a muscle-expressed protein plays a role in LDD through 
mechanisms similar to those proposed for OA, considered by 
some to be a multi-tissue syndrome rather than simply a 
disease of cartilage. 4 

The main limitation of the study is one of obtaining an 
accurate phenotype on individuals which is known to be an 
important factor in the success of GWA. 47 There is at present 
no agreed gold standard imaging method in the determination 
of LDD, although it is recognised that MRI offers the most sen- 
sitive, widely available tool. Even so, MR is relatively expensive 
and many of the largest spine cohorts in the world have plain 
radiographs, which offer more limited phenotypic information. 
The coding method applied to the imaging is also yet to be for- 
mally standardised: our interest in the individual subtraits of 
LDD led us to devise a coding method in which they were 
separated, as reported previously. 7 In order to obtain sufficient 
sample size, a number of cohorts contributed having different 
imaging methods, but traits were selected to enable comparison 
across the cohorts. Thus, study groups recoded their imaging 
where necessary to meet uniform requirements for inclusion. 
We included measures of disc height (coded 0-3) and anterior 
osteophytes in RSI, RS3, GARP and TUK (also coded 0-3) and 
posterior osteophytes in FHS (coded 0-3). These subpheno- 
types were summed over the five discs and underwent inverse 
normal transformation to give a normal distribution. A further 
limitation is that four cohorts are population samples while 
GARP is derived from OA-affected sibling pairs. We included 
GARP because it has made a contribution to similar analyses 
performed for OA 48 and, with adjustment for relatedness, pro- 
vides data comparable with other studies. While the differing 
methods of imaging provide different amounts of information 
so the LDD variable has lower mean in those cohorts with 
radiographs, the variance is comparable. Where GARP samples 
made a contribution to the meta-analysis (a number of the 
significant SNPs did not include a contribution from GARP, 
table 2), the minor allele frequency was similar to those of 
other groups. The TUK group has a disproportionate number 
of women, for historical reasons. The men were retained, 
however, as they did not differ significantly from women in 
the LDD variable or body mass index (data not shown). This 
study lacks a replication group. A second sample of similar size 
to the first is considered important to show that the findings 
of the first sample are true positives. Unfortunately there are, 
to our knowledge, no other collections of Northern Europeans 
having spine imaging which together would approach our 
sample size. There is considerable evidence in the literature 
that the genetic predisposition between Northern Europeans 
and Asians to OA is different 49 and, given the similarities 
between OA and LDD, we felt replication should be made in 
Northern Europeans. We elected to include all the subjects in 



a single, powerful study rather than split the sample and 
reduce the chances of finding significant novel loci associated 
with LDD. 

In conclusion, this is the first large-scale GW\ study of LDD 
and we have identified several novel variants in the PARK2 
gene and in PSMB9 within MHC class 2. We have shown in a 
small subset that methylation at one of the PARK2 promoters 
is associated with MRI determined LDD. Both loci merit 
further investigation to shed light on the important clinical 
endpoint of low back pain. 
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